Devising Affordable and Functional Linked Data Archives
Abstract
Linked Data has become an integral part of the Web. Like any other web resource, Linked Data changes over time. Typically, only the most recent version of a Linked Data set can be accessed via Subject-URIs and queried by means of SPARQL. Sometimes, selected archived versions are made available for bulk download. This archive access approach is cheap for the publisher but, unfortunately, very expensive for consumers: the entire data dump must be downloaded and ingested into infrastructure that supports Subject-URI and/or SPARQL access, and comparing data across different archived versions is even harder. To address this publisher-consumer imbalance, we propose a solution for publishing archived Linked Data that is affordable for publishers and functional for consumers. It consists of two components: a static storage approach for archived Linked Data that exposes a lightweight RDF interface, and the subsequent extension of that interface to versioned data.

The Linked Data Fragments (LDF) conceptual framework [2] allows for an analysis of existing and possible new interfaces for publishing RDF data on the Web. Based on insights gained from this framework, we previously designed an interface called Triple Pattern Fragments (TPF) [2], which provides access to data by means of ?s ?p ?o Query-URIs. In contrast to SPARQL endpoints, TPF servers cannot evaluate SPARQL queries, which keeps the maximal per-request processing cost low. To answer complex SPARQL queries, clients evaluate the queries locally, using the server only to retrieve triple pattern data. While this makes query evaluation slower and more bandwidth-intensive, the total server cost remains lower. Additionally, because request patterns are more limited, responses are more likely to be cached.

Since TPF is a regular HTTP interface, it can be augmented with support for datetime negotiation as defined in the Memento protocol [1]. This allows clients to use the Accept-Datetime HTTP request header to ask for the responses to ?s ?p ?o Query-URIs and Subject-URIs as they were at a given time in the past. The TPF server replies with the temporally closest archived version, using the Memento-Datetime HTTP response header to indicate the archival datetime of the returned representation. As a result, a client that wants to obtain the result of a SPARQL query as it was at a previous point in time only needs to break the SPARQL query down into the necessary ?s ?p ?o Query-URIs and request each of them with the Accept-Datetime HTTP request header.

In the Linked Data Archive, each temporal version of a Linked Data set is stored in the HDT (Header Dictionary Triples) format for binary representation of RDF data. HDT files are static, highly compressed, and provide fast triple pattern lookups and estimated result counts. The latter two features are essential for the TPF interface, as they allow SPARQL queries to be broken down into multiple ?s ?p ?o Query-URIs.

The TPF/HDT combination is attractive for a publisher of archived Linked Data because of its static nature, minimal storage requirements, and constrained query support. For consumers of the archive, it offers useful functionality through datetime negotiation on Subject-URIs and ?s ?p ?o Query-URIs. Resolving temporal SPARQL queries remains possible, albeit at a higher cost for the client, yet it is still far cheaper than downloading one or more data dumps and loading them into infrastructure that natively supports SPARQL queries.
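To make the interaction concrete, the sketch below shows how a client might request a single triple pattern from an archived TPF endpoint using Memento datetime negotiation. The Accept-Datetime and Memento-Datetime header names are defined by the Memento protocol; the endpoint URL is a placeholder, and the subject/predicate/object parameter names follow the common TPF convention but would normally be discovered from the fragment's hypermedia controls.

```python
import requests

# Hypothetical archived-TPF endpoint; a real client discovers the fragment URL
# and its query-parameter names from the dataset's hypermedia controls.
TPF_ENDPOINT = "http://example.org/fragments/dbpedia"

params = {
    "subject": "http://dbpedia.org/resource/Tim_Berners-Lee",
    "predicate": "",  # left empty: wildcard ?p
    "object": "",     # left empty: wildcard ?o
}

headers = {
    # Memento datetime negotiation: ask for the data as it was on this date.
    "Accept-Datetime": "Mon, 01 Jun 2015 00:00:00 GMT",
    # TPF responses are RDF; TriG is one commonly supported serialization.
    "Accept": "application/trig",
}

response = requests.get(TPF_ENDPOINT, params=params, headers=headers)

# The server reports which archived version it selected.
print("Memento-Datetime:", response.headers.get("Memento-Datetime"))
# The body contains the matching triples plus paging and count metadata.
print(response.text[:500])
```

A client answering a full temporal SPARQL query would issue many such requests, one per triple pattern, and perform the joins locally.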
In this presentation, we included a brief refresher on the Memento protocol as it applies to Linked Data, and we covered Linked Data Fragments, Triple Pattern Fragments, and the HDT storage format in more detail. We introduced the DBpedia Archive, which contains each version of the DBpedia dataset and, in total, consists of over 5 billion RDF triples. To demonstrate the potential of the solution, we showed how queries can be executed live over multiple datasets, using examples from the digital libraries domain that were previously impossible to evaluate over live Web data.
Similar resources
Toward sustainable publishing and querying of distributed Linked Data archives
Purpose This paper details a low-cost, low-maintenance publishing strategy aimed at unlocking the value of Linked Data collections held by libraries, archives and museums. Design/methodology/approach The shortcomings of commonly used Linked Data publishing approaches are identified, and the current lack of substantial collections of Linked Data exposed by libraries, archives and museums is cons...
An optimized affordable DNA-extraction method from Salmonella enterica Enteritidis for PCR experiments
In diagnostic and research bacteriology settings with budget and staff restrictions, fast and cost-effective genome extraction methods are desirable. If not inactivated properly, cellular and/or environmental DNA nucleases will degrade genomic material during the extraction stage, and therefore might give rise to incorrect results in PCR experiments. When crude cell extracts, proteinase K–treat...
Structural underpinnings of functional plasticity in rodent visual cortex.
Functional plasticity in rodent visual cortex has been intensively studied since the pioneering experiments of Hubel and Wiesel in the sixties. Nevertheless, the structural modifications underlying this phenomenon remain elusive. In this article, we will review recent data focused on the dynamic of excitatory and inhibitory synapses and their structural changes linked to functional modification...
Identifying bibliographic relationships in the National Library of Iran catalogue based on the Functional Requirements for Bibliographic Records (FRBR) model: a first step in representing the knowledge network of Iranian-Islamic publications
The aim of this study is to find out the bibliographic relationships between the metadata records in the National Library and Archives of Iran (NLAI) according to FRBR model, in order to represent the Knowledge network of Iranian-Islamic publications. To achieve this objective, the content analysis method was used. The study population includes metadata records for books in NLAI for four biblio...
A Linked Open Data Architecture for Contemporary Historical Archives
This paper presents an architecture for historical archives maintenance based on Open Linked Data technologies and open source distributed development model and tools. The proposed architecture is being implemented for the archives of the Center for Teaching and Research in the Social Sciences and Contemporary History of Brazil (CPDOC) from Getulio Vargas Foundation (FGV).
Journal: TCDL Bulletin
Volume 13, Issue -
Pages: -
Publication year: 2017